Availability Modeling and Evaluation on High Performance Cluster Computing Systems
Abstract
Cluster computing has attracted growing attention from both industry and academia for its enormous computing power and scalability. A Beowulf-type cluster, for example, is a typical High Performance Computing (HPC) cluster system. Availability, as a key attribute of the system, needs to be considered at the system design stage and monitored at mission time. Moreover, system monitoring is essential to help identify defects and ensure that the system meets its availability requirement. This paper investigates novel solutions that provide availability modeling, model evaluation, and data analysis as a single framework; these three components are the core of the investigation. The general availability concepts and modeling techniques are briefly reviewed. The system's availability model is divided into submodels based on their functionalities. Furthermore, an object-oriented Markov model specification has been developed to facilitate availability modeling and runtime configuration. Numerical solutions for Markov models are examined, with particular attention to the uniformization method. The paper also presents a monitoring and data analysis framework, which is responsible for failure analysis and availability reconfiguration. ACM Classification: D.2.11, D.2.12, D.2.13
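The uniformization method mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `uniformization`, the two-state up/down model, and the failure and repair rates are all hypothetical choices made for illustration. The sketch computes transient state probabilities of a continuous-time Markov chain by converting the generator matrix Q into a discrete-time matrix P = I + Q/Λ and weighting the powers of P with Poisson probabilities.

```python
import math


def uniformization(Q, p0, t, tol=1e-12, max_terms=100000):
    """Approximate p(t) = p0 * exp(Q t) for a CTMC with generator Q.

    Q is a list of rows (each row sums to zero), p0 is the initial
    distribution, and t is the mission time. Intended for moderate
    values of rate * t; for very large values exp(-rate * t) underflows
    and a scaled variant would be needed.
    """
    n = len(Q)
    # Uniformization rate: the largest exit rate among all states.
    rate = max(-Q[i][i] for i in range(n))
    if rate == 0.0 or t == 0.0:
        return list(p0)
    # Discrete-time transition matrix P = I + Q / rate.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    weight = math.exp(-rate * t)      # Poisson probability of 0 jumps
    acc = weight                      # accumulated Poisson mass
    v = list(p0)                      # p0 * P^k, starting with k = 0
    result = [weight * x for x in v]
    for k in range(1, max_terms + 1):
        if 1.0 - acc <= tol:          # remaining Poisson tail is negligible
            break
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k        # Poisson probability of k jumps
        acc += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result


if __name__ == "__main__":
    # Hypothetical two-state model: state 0 = up, state 1 = down.
    failure_rate, repair_rate = 0.01, 1.0
    Q = [[-failure_rate, failure_rate],
         [repair_rate, -repair_rate]]
    p = uniformization(Q, [1.0, 0.0], t=5.0)
    print("availability at t=5:", p[0])
```

For this two-state model the result can be checked against the closed-form availability A(t) = μ/(λ+μ) + λ/(λ+μ)·exp(-(λ+μ)t), where λ is the failure rate and μ the repair rate.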
Related articles
Green Energy-aware task scheduling using the DVFS technique in Cloud Computing
Nowadays, energy consumption has become a critical issue in high-performance distributed computing systems, so green computing tries to reduce energy consumption, carbon footprint and CO2 emissions in high performance computing systems (HPCs) such as clusters, Grid and Cloud that a large number of parallel. Reducing energy consumption for high-end computing can bring various benefits such as red...
The Modeling and Dependability Analysis of High Availability OSCAR Cluster System
OSCAR is widely used for building and maintaining high-performance parallel computing systems. In many cases, the high-availability requirement becomes as critical as high performance. In this paper, the current OSCAR cluster system is introduced. Several high-availability considerations are discussed and the high-availability OSCAR cluster system is presented. Continuous Time Markov Chain models are b...
A New Availability Concept for (n, k)-way Cluster Systems Regarding Waiting Time
It is necessary to have the precise definition of available performance of high availability systems that can represent the availability and performability of the systems altogether. However, the difference between numeric scales of availability and performance metrics such as waiting time makes quantitative evaluation difficult. A number of previous studies on availability do not include a per...
Parallel computing using MPI and OpenMP on self-configured platform, UMZHPC.
Parallel computing is a topic of interest for a broad scientific community since it facilitates many time-consuming algorithms in different application domains. In this paper, we introduce a novel platform for parallel computing using MPI and OpenMP, based on a set of networked PCs. UMZHPC is a free Linux-based parallel computing infrastructure that has been developed to cr...
A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints
One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...
Journal:
- Journal of Research and Practice in Information Technology
Volume 38
Publication date: 2006